Xception: Software Fault Injection and Monitoring in Processor Functional Units
Abstract
This paper presents Xception, a software fault injection and monitoring environment. Xception uses the advanced debugging and performance monitoring features available in most modern processors to inject more realistic faults by software, and to monitor in detail the activation of the faults and their impact on the behaviour of the target system. Faults are injected with minimum interference with the target application: the application is not modified, no software traps are inserted, and it does not need to run in a special trace mode (it executes at full speed). Xception provides a comprehensive set of fault triggers, including spatial and temporal fault triggers, and triggers related to the manipulation of data in memory. Faults injected by Xception can affect any process running on the target system, including the operating system. Sets of faults can be defined by the user according to several criteria, including the emulation of faults in specific functional units of the target processor. Presently, Xception has been implemented on a parallel machine built around the PowerPC 601 processor running the PARIX operating system. Experimental results are presented showing the impact of faults on several parallel applications running on a commercial parallel system. It is shown that up to 73% of the faults, depending on the processor functional unit affected, can cause the application to produce wrong results. The results show that the impact of faults depends heavily on the application and on the specific processor functional unit affected by the fault.

This work was supported by Esprit project 6731 FTMPS, "Fault Tolerant Massively Parallel Systems".
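The idea of spatial and temporal fault triggers mentioned in the abstract can be illustrated with a small simulation. The sketch below is not Xception and does not use any processor debugging features; it is a toy model in which every name (`Fault`, `run`, the program layout) is a hypothetical choice made for illustration. It flips one bit of a simulated adder result when either an instruction address (spatial trigger) or a step count (temporal trigger) is reached, and records when the fault was activated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fault:
    """Hypothetical fault description: flip one bit of an adder result."""
    bit: int                            # bit of the 32-bit result to flip
    trigger_pc: Optional[int] = None    # spatial trigger: instruction address
    trigger_time: Optional[int] = None  # temporal trigger: step count
    activated_at: Optional[int] = None  # monitoring: step at which it fired


def run(program: List[Tuple[int, int, int]],
        fault: Optional[Fault] = None) -> List[int]:
    """Execute (pc, a, b) additions on a toy 32-bit adder.

    The program itself is never modified; the fault is applied to the
    adder output only when one of its triggers matches, and only once.
    """
    results = []
    for step, (pc, a, b) in enumerate(program):
        value = (a + b) & 0xFFFFFFFF
        if fault is not None and fault.activated_at is None:
            spatial = fault.trigger_pc is not None and pc == fault.trigger_pc
            temporal = (fault.trigger_time is not None
                        and step == fault.trigger_time)
            if spatial or temporal:
                value ^= 1 << fault.bit      # single bit-flip in the result
                fault.activated_at = step    # record the activation
        results.append(value)
    return results


program = [(0x100, 1, 2), (0x104, 3, 4), (0x108, 5, 6)]
golden = run(program)                                  # fault-free reference run
spatial_fault = Fault(bit=0, trigger_pc=0x104)
faulty = run(program, spatial_fault)
wrong_result = faulty != golden                        # did the fault change the output?
```

Comparing a faulty run against a fault-free reference run, as in the last lines, is one simple way to classify whether an injected fault led to wrong results.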
Similar resources
Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers
An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose, however with the rapid increase in processor complexity, traditional techniques are also increasingly more difficult to apply. This paper presents a new software implemented fault injection and monitoring environment, called Xce...
On the Extension of Xception to Support Software Fault Models
Software faults are recognized as the major cause of system outages. The two possible approaches to overcome this problem are fault avoidance and fault tolerance. Quality assurance techniques fail to attain the zero defects mark, making fault tolerance vital to assure mission and business critical systems dependability. One major issue is the difficulty in the verification and validation of sof...
On the Emulation of Software Faults by Software Fault Injection
This paper presents an experimental study on the emulation of software faults by fault injection. In a first experiment, a set of real software faults has been compared with faults injected by a SWIFI tool (Xception) to evaluate the accuracy of the injected faults. Results revealed the limitations of Xception (and other SWIFI tools) in the emulation of different classes of software faults (abou...
Xception fault injection and robustness testing framework: a case-study of testing RTEMS
Xception is an automated and comprehensive fault injection and robustness testing environment that enables accurate and flexible V&V (verification & validation) and evaluation of mission and business critical computer systems and computer components, with particular emphasis to software components. In this paper we focus on the new robustness testing features of Xception and illustrate them wit...